Dynamic optimization for removal of strong atomicity barriers

ABSTRACT

A method and apparatus for dynamic optimization of strong atomicity barriers is herein described. During runtime compilation, code including non-transactional memory accesses that are to conflict with transactional memory accesses is patched to insert transactional barriers at the conflicting non-transactional memory accesses to ensure isolation and strong atomicity. However, barriers are omitted or removed from non-transactional memory accesses that do not conflict with transactional memory accesses to reduce barrier execution overhead.

RELATED APPLICATIONS

This application is a continuation of and claims priority to U.S. patent application Ser. No. 12/142,102 entitled “DYNAMIC OPTIMIZATION FOR REMOVAL OF STRONG ATOMICITY BARRIERS” filed on Jun. 19, 2008; this application is entirely incorporated by reference.

FIELD

This invention relates to the field of processor execution and, in particular, to execution of groups of instructions.

BACKGROUND

Advances in semiconductor processing and logic design have permitted an increase in the amount of logic that may be present on integrated circuit devices. As a result, computer system configurations have evolved from a single or multiple integrated circuits in a system to multiple cores and multiple logical processors present on individual integrated circuits. A processor or integrated circuit typically comprises a single processor die, where the processor die may include any number of cores or logical processors.

The ever increasing number of cores and logical processors on integrated circuits enables more software threads to be concurrently executed. However, the increase in the number of software threads that may be executed simultaneously has created problems with synchronizing data shared among the software threads. One common solution to accessing shared data in multiple core or multiple logical processor systems comprises the use of locks to guarantee mutual exclusion across multiple accesses to shared data. However, the ever increasing ability to execute multiple software threads potentially results in false contention and a serialization of execution.

For example, consider a hash table holding shared data. With a lock system, a programmer may lock the entire hash table, allowing one thread to access the entire hash table. However, throughput and performance of other threads is potentially adversely affected, as they are unable to access any entries in the hash table until the lock is released. Alternatively, each entry in the hash table may be locked. Either way, after extrapolating this simple example into a large scalable program, it is apparent that the complexity of lock contention, serialization, fine-grain synchronization, and deadlock avoidance becomes an extremely cumbersome burden for programmers.

Another recent data synchronization technique includes the use of transactional memory (TM). Often transactional execution includes executing a grouping of a plurality of micro-operations, operations, or instructions. In the example above, both threads execute within the hash table, and their memory accesses are monitored/tracked. If both threads access/alter the same entry, conflict resolution may be performed to ensure data validity. One type of transactional execution includes a Software Transactional Memory (STM), where tracking of memory accesses, conflict resolution, abort tasks, and other transactional tasks are performed in software.

In weakly atomic transactional memory systems, only transactional accesses are isolated from each other. In such systems, non-transactional memory accesses are not tracked and, thus, do not incur any additional transactional overhead. However, weakly atomic systems do not provide general isolation and ordering guarantees for programs that mix transactional and non-transactional accesses to the same data. In some cases, this may lead to incorrect execution, because conflicting transactional and non-transactional accesses are not isolated from one another.

In contrast, in strongly atomic transactional memory systems, to ensure runtime conflicts between transactional memory operations and non-transactional memory operations do not occur, compilers treat each non-transactional memory operation as a single operation transaction. In other words, transactional barriers are inserted at non-transactional memory accesses to isolate transactions from these non-transactional memory accesses. Here, the potential incorrect execution due to conflicts between transactional and non-transactional accesses is avoided; yet, execution of transactional barriers at every non-transactional memory operation potentially wastes execution cycles.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention is illustrated by way of example and not intended to be limited by the figures of the accompanying drawings.

FIG. 1 illustrates an embodiment of a processor including multiple processing elements capable of executing multiple software threads.

FIG. 2 illustrates an embodiment of structures to support transactional execution.

FIG. 3 illustrates an embodiment of a flowchart for a method of providing optimized strong atomicity in transactional systems.

FIG. 4 illustrates an embodiment of a flowchart for a method of optimizing barriers for a strong atomicity transactional memory system.

FIG. 5a illustrates another embodiment of a flowchart for a method of optimizing barriers for a strong atomicity transactional memory system upon encountering a non-transactional memory access.

FIG. 5b illustrates another embodiment of a flowchart for a method of optimizing barriers for a strong atomicity transactional memory system upon encountering a transactional memory access.

FIG. 6 illustrates an illustrative embodiment of an access table to support optimization of transactional barriers for exemplary code.

DETAILED DESCRIPTION

In the following description, numerous specific details are set forth such as examples of specific hardware/software support for transactional execution, specific shared memory access tracking, specific locking/versioning/meta-data methods, specific types of local/memory in processors, and specific types of memory accesses and locations, etc. in order to provide a thorough understanding of the present invention. It will be apparent, however, to one skilled in the art that these specific details need not be employed to practice the present invention. In other instances, well known components or methods, such as coding of transactions in software, demarcation of transactions, specific and alternative multi-core and multi-threaded processor architectures, transaction hardware, cache organizations, specific compiler methods/implementations, and specific operational details of microprocessors, have not been described in detail in order to avoid unnecessarily obscuring the present invention.

The method and apparatus described herein are for providing dynamically optimized barriers for strong atomicity in code. Specifically, dynamic optimization of barriers is primarily discussed in reference to an illustrative Software Transactional Memory system (STM). However, the methods and apparatus for optimizing barriers for strong atomicity are not so limited, as they may be implemented in association with any transactional memory system.

Referring to FIG. 1, an embodiment of a processor capable of both execution of code to dynamically optimize barriers for strong atomicity and execution of optimized strong atomicity code is illustrated. Processor 100 includes any processor, such as a micro-processor, an embedded processor, a digital signal processor (DSP), a network processor, or other device to execute code. Processor 100 includes a plurality of processing elements.

In one embodiment, a processing element refers to a thread unit, a process unit, a context, a logical processor, a hardware thread, a core, and/or any other element, which is capable of holding a state for a processor, such as an execution state or architectural state. In other words, a processing element, in one embodiment, refers to any hardware capable of being independently associated with code, such as a software thread, operating system, application, or other code. A physical processor typically refers to an integrated circuit, which potentially includes any number of other processing elements, such as cores or hardware threads.

A core often refers to logic located on an integrated circuit capable of maintaining an independent architectural state wherein each independently maintained architectural state is associated with at least some dedicated execution resources. In contrast to cores, a hardware thread typically refers to any logic located on an integrated circuit capable of maintaining an independent architectural state wherein the independently maintained architectural states share access to execution resources. As can be seen, when certain resources are shared and others are dedicated to an architectural state, the line between the nomenclature of a hardware thread and core overlaps. Yet often, a core and a hardware thread are viewed by an operating system as individual logical processors, where the operating system is able to individually schedule operations on each logical processor.

Physical processor 100, as illustrated in FIG. 1, includes two cores, core 101 and 102, which share access to higher level cache 110. Although processor 100 may include asymmetric cores, i.e. cores with different configurations, functional units, and/or logic, symmetric cores are illustrated. As a result, core 102, which is illustrated as identical to core 101, will not be discussed in detail to avoid repetitive discussion. In addition, core 101 includes two hardware threads 101a and 101b, while core 102 includes two hardware threads 102a and 102b. Therefore, software entities, such as an operating system, potentially view processor 100 as four separate processors, i.e. four processors capable of executing four software threads.

Here, a first thread is associated with architecture state registers 101a, a second thread is associated with architecture state registers 101b, a third thread is associated with architecture state registers 102a, and a fourth thread is associated with architecture state registers 102b. As illustrated, architecture state registers 101a are replicated in architecture state registers 101b, so individual architecture states/contexts are capable of being stored for logical processor 101a and logical processor 101b. Other smaller resources, such as instruction pointers and renaming logic in rename allocator logic 130, may also be replicated for threads 101a and 101b. Some resources, such as re-order buffers in reorder/retirement unit 135, I-TLB 120, load/store buffers, and queues may be shared through partitioning. Other resources, such as general purpose internal registers, page-table base register, low-level data-cache and data-TLB 115, execution unit(s) 140, and portions of out-of-order unit 135 are potentially fully shared.

Processor 100 often includes other resources, which may be fully shared, shared through partitioning, or dedicated by/to processing elements. In FIG. 1, an embodiment of exemplary functional units/resources of a processor is illustrated. Note that a processor may include, or omit, any of these functional units, as well as include any known functional units, logic, or firmware not depicted.

As illustrated, processor 100 includes bus interface module 105 to communicate with devices external to processor 100, such as system memory 175, a chipset, a northbridge, or other integrated circuit. Memory 175 may be dedicated to processor 100 or shared with other devices in a system. Higher-level or further-out cache 110 is to cache recently fetched elements from higher-level cache 110. Note that higher-level or further-out refers to cache levels increasing or getting further away from the execution unit(s). In one embodiment, higher-level cache 110 is a second-level data cache. However, higher level cache 110 is not so limited, as it may be associated with or include an instruction cache. A trace cache, i.e. a type of instruction cache, may instead be coupled after decoder 125 to store recently decoded traces. Module 120 also potentially includes a branch target buffer to predict branches to be executed/taken and an instruction-translation buffer (I-TLB) to store address translation entries for instructions.

Decode module 125 is coupled to fetch unit 120 to decode fetched elements. In one embodiment, processor 100 is associated with an Instruction Set Architecture (ISA), which defines/specifies instructions executable on processor 100. Here, often machine code instructions recognized by the ISA include a portion of the instruction referred to as an opcode, which references/specifies an instruction or operation to be performed.

In one example, allocator and renamer block 130 includes an allocator to reserve resources, such as register files to store instruction processing results. However, threads 101a and 101b are potentially capable of out-of-order execution, where allocator and renamer block 130 also reserves other resources, such as reorder buffers to track instruction results. Unit 130 may also include a register renamer to rename program/instruction reference registers to other registers internal to processor 100. Reorder/retirement unit 135 includes components, such as the reorder buffers mentioned above, load buffers, and store buffers, to support out-of-order execution and later in-order retirement of instructions executed out-of-order.

Scheduler and execution unit(s) block 140, in one embodiment, includes a scheduler unit to schedule instructions/operations on execution units. For example, a floating point instruction is scheduled on a port of an execution unit that has an available floating point execution unit. Register files associated with the execution units are also included to store instruction processing results. Exemplary execution units include a floating point execution unit, an integer execution unit, a jump execution unit, a load execution unit, a store execution unit, and other known execution units.

Lower level data cache and data translation buffer (D-TLB) 150 are coupled to execution unit(s) 140. The data cache is to store recently used/operated on elements, such as data operands, which are potentially held in memory coherency states. The D-TLB is to store recent virtual/linear to physical address translations. As a specific example, a processor may include a page table structure to break physical memory into a plurality of virtual pages.

In one embodiment, processor 100 is capable of transactional execution. A transaction, which may also be referred to as a critical or atomic section of code, includes a grouping of instructions, operations, or micro-operations to be executed as a group. For example, instructions or operations may be used to demarcate a transaction or a critical section. Typically, during execution of a transaction, updates to memory are not made globally visible until the transaction is committed. While the transaction is still pending, locations loaded from and written to within a memory are tracked. Upon successful validation of those memory locations, the transaction is committed and updates made during the transaction are made globally visible.

However, if the transaction is invalidated during its pendency, the transaction is restarted without making the updates globally visible. As a result, pendency of a transaction, as used herein, refers to a transaction that has begun execution and has not been committed or aborted, i.e. pending. Example implementations for transactional execution include a Hardware Transactional Memory (HTM) system, a Software Transactional Memory (STM) system, and a combination thereof.

A Software Transactional Memory (STM) system often refers to performing access tracking, conflict resolution, or other transactional memory tasks in or at least partially in software. As a general example, a compiler, when executed, compiles program code to insert calls to read and write barriers for transactional load and store operations, accordingly. A compiler may also insert other transactional and non-transaction related operations, such as commit operations, abort operations, bookkeeping operations, conflict detection operations, and strong atomicity operations.

As stated above, in previous strong atomicity transactional systems, non-transactional memory access operations are treated as single transactions. Here, a compiler inserts read and write barriers at every non-transactional memory access to ensure strong atomicity, i.e. to isolate transactional memory accesses from non-transactional memory accesses. As an example, a call to a write barrier is inserted at every non-transactional write operation. In this example, a provided write barrier, when called, is to perform operations to ensure isolation, such as performing a lock acquire operation/function to acquire a lock for a memory location and a lock release operation/function to release a lock for a memory location. However, performing these barrier operations at every non-transactional memory access is potentially expensive and unnecessary.

Therefore, in one embodiment, transactional barriers are dynamically optimized for removal of unnecessary transactional barriers from non-transactional memory access operations, while providing strong isolation and atomicity guarantees. In one embodiment, it is dynamically determined if a memory location associated with a non-transactional memory access operation may be conflictingly accessed within a transaction. In one embodiment, conflicting accesses include memory accesses where at least one of the memory accesses is a write, such as a transactional write to a memory location to be loaded by a non-transactional load operation. Here, a transactional read of a memory location to be read by a non-transactional operation does not constitute a conflict, as neither access updates the memory location.
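As a minimal sketch of the conflict rule just described, the following Python fragment treats a transactional access and a non-transactional access to the same data as conflicting only when at least one of them is a write. The names Access and may_conflict are hypothetical and chosen for illustration; they are not part of the described system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Access:
    location: str        # data reference, e.g. "A::x" (illustrative)
    is_write: bool       # True for a store, False for a load
    transactional: bool  # True if the access occurs inside a transaction


def may_conflict(a: Access, b: Access) -> bool:
    """Return True when a transactional and a non-transactional access
    to the same location include at least one write."""
    if a.location != b.location:
        return False
    if a.transactional == b.transactional:
        return False  # this sketch only considers mixed pairs
    return a.is_write or b.is_write
```

Under this rule, a transactional write paired with a non-transactional load of the same location is a conflict, while two loads of the same location are not.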

If a compiler concludes that the non-transactional access cannot conflict with any transactional access, then it does not generate a transactional barrier at the non-transactional memory access. Here, by convention of the program design, the non-transactional memory access is isolated from transactional memory accesses, i.e. the memory location is not accessed in a transaction. As a consequence, no transactional barrier is inserted, which results in optimized barrier execution overhead. Alternatively, if a non-transactional access may conflict with a transaction, then the appropriate transactional barriers are inserted to provide strong atomicity for avoiding incorrect execution. If the compiler later encounters a transactional access that may conflict with a non-transactional access for which no barrier was previously generated, the compiler modifies the generated code to contain the barrier (for example, via patching).

In one embodiment, the dynamic analysis described above occurs during runtime compilation of code, such as in a managed runtime environment. Therefore, different portions of code may be individually compiled on multiple processing elements of processor 100. For example, a first portion of code including a non-transactional load operation is compiled on core 101, while a second portion of the code including a transactional store operation that is to conflict with the non-transactional load operation is being compiled on core 102.

Referring to FIG. 2, a simplified illustrative embodiment of an STM system is depicted. In one embodiment of an STM, transactional barriers, such as read and write barriers, are utilized to ensure data consistency during memory access operations. As above, these barriers, when executed, are to perform similar transactional tasks, such as detecting invalidating accesses. In other words, transactional barriers perform bookkeeping to ensure isolation and data validity.

In one embodiment, memory locations, such as data object 201 held in cache line 205, are associated with meta-data locations, such as meta-data location 250 in array 240. Here, when a memory location, such as data object 201 held in cache line 215, is unlocked, meta-data location 250 holds a version value, i.e. a version number of data object 201.

In another embodiment, which is not depicted, alternative methods for mapping meta-data to data elements or objects are utilized. For example, data element 201 potentially includes a data object with any number of object fields. Here, meta-data, such as location 250, is held in a field of the object or a header of the object. As an example, meta-data held in a header of an object is utilized as meta-data for all the object fields within the object. Therefore, although the description of FIG. 2 primarily focuses on cache line conflict detection in an unmanaged environment, such as C/C++, the methods and apparatus described herein may be utilized in any transactional memory system, such as in an object based conflict detection system in a managed environment.

As an illustrative example, a read barrier logs a previous version value 251 in read log 265 upon a load of data object 201. As a result, this previous version value may later be utilized to determine if data object 201 was updated during execution of the transaction, i.e. whether a current version value held in meta-data location 250 is different from previously logged version value 251.

Continuing the example, when a memory location is owned, meta-data location 250 holds a locked value 252, such as a generic locked value or a pointer to a transaction descriptor indicating which transaction or processing element owns the memory location. Here, a write barrier may acquire the lock before writing to the location, i.e. update meta-data location 250 to owned value 252. Note that the versioning scheme above is discussed in reference to an optimistic read STM, where lighter weight read barriers, i.e. version logging, are performed for reads, and more extensive write-barrier operations, i.e. acquiring a lock, are performed for writes. However, read and write barriers are not so limited, as writes may be performed more optimistically in different STM implementations.
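The optimistic read scheme described above may be sketched as follows. This is a single-threaded illustration under stated assumptions, not a definitive implementation: the class and method names (MetaData, Transaction, read_barrier, write_barrier, validate, commit) are hypothetical, the meta-data layout is simplified to two fields, and a real STM would use atomic operations to acquire ownership.

```python
class MetaData:
    """Meta-data location: a version number plus the current owner.
    owner is None when the location is unlocked (hypothetical layout)."""

    def __init__(self):
        self.version = 0
        self.owner = None


class Transaction:
    def __init__(self):
        self.read_log = []   # (meta, version observed at read time)
        self.write_set = []  # meta-data locations owned by this transaction

    def read_barrier(self, meta):
        # Lightweight: only log the version observed at the time of the load.
        if meta.owner is not None and meta.owner is not self:
            raise RuntimeError("location owned by another transaction")
        self.read_log.append((meta, meta.version))

    def write_barrier(self, meta):
        # Heavier: acquire ownership (the lock) before writing.
        if meta.owner is None:
            meta.owner = self
            self.write_set.append(meta)
        elif meta.owner is not self:
            raise RuntimeError("location owned by another transaction")

    def validate(self):
        # A logged read is still valid if the version has not changed.
        return all(meta.version == seen for meta, seen in self.read_log)

    def commit(self):
        if not self.validate():
            return False
        for meta in self.write_set:
            meta.version += 1  # publish a new version
            meta.owner = None  # release the lock
        self.write_set.clear()
        return True
```

In this sketch, a transaction that logged version 0 of a location fails validation once another transaction commits a write to that location and advances the version.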

Also note that the aforementioned tasks to be performed by execution of read and write barriers are purely illustrative, as any bookkeeping, versioning, or other task to be performed upon a read or write in association with transactional execution may constitute a transactional barrier. A non-exhaustive exemplary list of typical barrier tasks includes: determining if a location is owned, acquiring a lock, performing a buffer related task, performing on-demand version validation, and logging values. Furthermore, read and write barriers may be updated and optimized to perform different or optimized tasks. As an example, code may include a patch or call to a transactional barrier, which resides in a provided library. The provided library may be updated to update the transactional barrier without affecting the original code.

In one embodiment, a dynamic not accessed in a transaction (D-NAIT) optimization is performed to optimize transactional barriers for non-transactional memory accesses to provide efficient strong atomicity. Here, transactional memory accesses are performed utilizing read and write barriers, such that transactions are isolated from each other. Furthermore, non-transactional accesses that access memory locations that are accessed within transactions are also performed utilizing read and write barriers to isolate them from transactional accesses. However, non-transactional memory accesses to memory locations that are not accessed in transactions are not performed utilizing read and write barriers to reduce overhead without sacrificing strong atomicity.

The examples above include one embodiment of implementing an STM; however, any known implementation of a transactional memory system may be utilized in conjunction with dynamic optimization of transactional barriers for strong atomicity, such as an STM, an Unbounded Transactional Memory (UTM) system, a hybrid Transactional Memory system, such as a hardware accelerated STM (HASTM), or any other transactional memory system.

Referring next to FIG. 3, an embodiment of a flowchart for a method of providing optimized strong atomicity in transactional systems is illustrated. Note the flowcharts of FIGS. 3-5 are illustrated in a substantially serial fashion. However, the methods illustrated by these Figures are not so limited, as they may occur in any order, as well as being performed at least partially in parallel. For example, in FIG. 5, an access may be added to a list in an entry in block 545 before updating a transaction access state of the entry in block 520.

Referencing FIG. 3, in block 305 it is dynamically determined if a non-transactional memory access operation may conflict with a transactional memory access operation. Previously, whole program static analysis may be utilized to analyze transactional code. In one embodiment, dynamic analysis/determination includes analysis during runtime compilation of code. For example, some dynamic loading languages, such as Java™ from Sun Microsystems, Inc., utilize runtime compilation to load and execute portions of code. As a result, these dynamic languages are capable of being abstracted over multiple types of physical hardware by compiling code at runtime to be interpreted correctly by hardware.

Although dynamic analysis is referred to above, in one embodiment, as analysis during runtime compilation of code, dynamic analysis, in another embodiment, includes any partial program analysis. Often, during runtime, a dynamic language analyzes each method, such as via a linear scan, utilizing a recompilation infrastructure, as discussed above. However, it is inherent in this discussion that only portions of code/programs are available to a dynamic language compiler during compilation. As a result, dynamic determination in one embodiment may include non-runtime compilation partial program analysis.

During dynamic analysis, the compiler, such as a runtime dynamic language compiler, is to determine if a non-transactional memory access operation may conflict with a transactional memory access operation. In one embodiment, any transactional memory access operation and non-transactional memory access operation encountered that are to read from or write to the same data are determined to be conflicting accesses.

Here, note that a conflict may not be equivalent to an invalidating access as commonly referred to in transactional memory; however, conflicting accesses may result in an invalidating access. In transactional memory, an invalidating access often refers to an actual invalidating access, such as a write to a memory location that is loaded from during a pendency of a transaction. In contrast, conflicting accesses, in one embodiment discussed herein, refers to a potential for accesses to be invalidating accesses. For example, during dynamic compilation a non-transactional write to a memory location is encountered and a transactional load from the memory location is also encountered, which results in conflicting accesses. However, during actual execution, the non-transactional write may occur outside of the transaction, which does not result in an invalidating access. Therefore, as can be seen, these conflicting accesses exhibit a potential for being invalidating; however, when they are executed they may not be invalidating.

In some transactional memory systems, such as optimistic read concurrency systems, reads/loads are performed optimistically, while more bookkeeping is performed in regards to writes and stores. Consequently, in one embodiment, a transactional load of a memory location and a non-transactional load of the same memory location are not determined to be conflicting accesses. Here, since both access operations only read from a memory location, there is no potential for invalid execution from incorrect data, as the memory location is not modified by either memory access operation. However, as an example, if either of the non-transactional or transactional memory access operations to the same location includes a store operation, then the accesses are to be considered conflicting.

In the discussion above, conflicting accesses were discussed in reference to the same memory location, or in other words, modification of the same data. Therefore, in one embodiment, any known method of determining whether a non-transactional memory access operation may access the same data as a transactional memory access operation may be utilized for determining if accesses are determined to be conflicting. Examples of other terms or references that are associated with the same data, or reference thereto, include a data element, a data object, a data reference, a field of a type of dynamic language code, a type of dynamic language code, a memory address to hold data, and a memory location to hold data.

A few of the examples above, such as a field of a type of dynamic language code and a type of dynamic language code, refer to data structures of dynamic language code. To illustrate, dynamic language code, such as Java™ from Sun Microsystems, Inc., is a strongly typed language. Each variable has a type that is known at compile time. The types are divided into two categories: primitive types (boolean and numeric, e.g., int, float) and reference types (classes, interfaces and arrays). The values of reference types are references to objects. In Java™, an object, which consists of fields, may be a class instance or an array. Given object a of class A, it is customary to use the notation A::x to refer to the field x of type A and a.x to refer to the field x of object a of class A. For example, an expression may be couched as a.x=a.y+a.z. Here, field y and field z are loaded to be added and the result is to be written to field x.

Therefore, conflict determination may be performed at any data level granularity. For example, in one embodiment, a conflict is detected at the type level. Here, a non-transactional write to a field A::x and a transactional load of field A::y are determined to be conflicting accesses. In another embodiment, conflict determination/analysis is performed at a field level granularity. Here, a non-transactional write to A::x and a transactional load of A::y are not determined to be conflicting. Note, other data structures or programming techniques may be taken into account in conflict analysis. As an example, assume that fields x and y of an object of class A, i.e. A::x and A::y, point to objects of class B, are initialized to newly allocated objects, and are never written to after initialization. In one embodiment, a non-transactional write to a field B::z of an object pointed to by A::x is not determined to be a conflicting access in regards to a transactional load of field B::z of an object pointed to by A::y.
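The difference between type level and field level granularity can be illustrated with a short sketch. The functions conflict_key and same_data, and the granularity strings "type" and "field", are hypothetical names introduced here for illustration; the sketch only covers the first two cases above and does not model the points-to reasoning of the A::x/A::y example.

```python
def conflict_key(field_ref: str, granularity: str) -> str:
    """Map a field reference such as "A::x" to the key compared during
    conflict analysis. granularity is "type" or "field"."""
    type_name, _, _field_name = field_ref.partition("::")
    if granularity == "type":
        return type_name   # all fields of a type share one key
    if granularity == "field":
        return field_ref   # each field has its own key
    raise ValueError(f"unknown granularity: {granularity}")


def same_data(ref_a: str, ref_b: str, granularity: str) -> bool:
    """Return True when two references map to the same conflict key."""
    return conflict_key(ref_a, granularity) == conflict_key(ref_b, granularity)
```

With type level keys, A::x and A::y are treated as the same data, so a write to one and a load of the other are candidates for a conflict; with field level keys they are distinct.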

In one embodiment, dynamic analysis of conflicting accesses is performed utilizing a memory access table. Here, a table is maintained during dynamic analysis, such as during runtime, to track whether conflicting non-transactional and transactional memory accesses are encountered. In one embodiment, as program code is compiled, the table is indexed with data referenced by encountered transactional and non-transactional memory accesses. Note that multiple data accesses may map to the same entry in the table. Each table entry is associated with a transaction access state. When a transactional access to data is encountered, the transaction access state of the appropriate data entry is updated accordingly.

In one embodiment, a table entry also holds a list of references to non-transactional memory accesses for which a compiler generated no barriers based on the transactional access state of the entry. This list of references is discussed in more detail below in reference to the discussion of an access identifier (ID) field.

As an example, during compilation when a non-transactional write to A::x is encountered, an entry of the memory access table is updated with the data reference, i.e. A::x, a default transaction access state of not accessed, and a reference to the non-transactional write, such as a statement location and/or method the non-transactional write is included within. Later, a transactional load of A::x is encountered. The transaction access entry associated with A::x of the table is updated to a read state to indicate that A::x is read in a transaction. Here, it is determined that the non-transactional write to A::x is to conflict with a transactional access, since the transaction access state includes a read state, i.e. a transaction is to load data that is to be written to by a non-transactional store.
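The A::x example above can be sketched with a small table keyed by data reference. All names here (TxState, Entry, AccessTable, note_non_tx_access, note_tx_access) are hypothetical and the layout is an assumption for illustration. For simplicity, this sketch records a non-transactional access as barrier-free only while the entry is still in the not accessed state, which is more conservative than the read/read rule discussed earlier.

```python
from dataclasses import dataclass, field
from enum import Flag, auto


class TxState(Flag):
    NOT_ACCESSED = 0   # default transaction access state
    READ = auto()      # data is read in some transaction
    WRITTEN = auto()   # data is written in some transaction


@dataclass
class Entry:
    state: TxState = TxState.NOT_ACCESSED
    # References (e.g. method and statement location) to non-transactional
    # accesses that were compiled without a barrier.
    unbarriered_sites: list = field(default_factory=list)


class AccessTable:
    def __init__(self):
        self.entries = {}  # data reference -> Entry

    def note_non_tx_access(self, data_ref, site):
        entry = self.entries.setdefault(data_ref, Entry())
        if entry.state == TxState.NOT_ACCESSED:
            entry.unbarriered_sites.append(site)
        return entry

    def note_tx_access(self, data_ref, is_write):
        entry = self.entries.setdefault(data_ref, Entry())
        entry.state |= TxState.WRITTEN if is_write else TxState.READ
        return entry
```

Walking through the example: recording a non-transactional write to "A::x" creates an entry in the not accessed state with one listed site; a later transactional load moves the entry to the read state, at which point the listed site is known to conflict.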

Note that a conflict between transactional and non-transactional accesses may be detected either at a non-transactional memory access or at a transactional memory access, as discussed in the example above. Therefore, in one embodiment, determining a conflict exists, or a potential thereof, includes determining if a data element referenced by a non-transactional memory access is conflictingly accessed in a transaction upon encountering the non-transactional memory access.

As an example, upon encountering the non-transactional memory access, the table is searched utilizing the data element referenced by the non-transactional memory access, and if an associated entry is found, then a transactional access state of the entry is checked. Here, if the transactional access state indicates a potential conflict with the non-transactional memory access, then an appropriate barrier is generated at the non-transactional memory access in block 315, which is discussed in more detail below. However, if no conflict is detected at that time, then in one embodiment, no barrier is inserted at block 310. In another embodiment, a lightweight barrier or other space-reserving operations may be inserted at the non-transactional memory access instead of no barrier. As an example of a space-reserving operation, a no-op may be initially inserted at the non-transactional memory access to save space for later patching.

Furthermore, upon encountering a transactional memory access referencing a data element, a conflict may also be detected. Here, in one embodiment, a transactional access state associated with the data element is updated in the table in response to encountering the transactional memory access. As stated above, the table may include a list of references to related non-transactional memory access operations that were previously encountered with no barrier or a lightweight barrier inserted. As a result, in one embodiment, where no barrier is inserted at related non-transactional memory access operations, code is recompiled utilizing on-stack replacement to insert appropriate barriers in block 315. In another embodiment, where other operations or lightweight barriers were inserted, the code is patched in block 315.

Note that patching may include overwriting the non-transactional memory access operation with a jump operation that, when executed, directs execution flow to a call to a transactional barrier and a copy of the non-transactional memory access operation. Alternatively, lightweight barriers or space-reserving no-ops may be overwritten with operations to direct the execution flow to a barrier. In one embodiment, when transactional accesses are encountered that initiate generation of a barrier at a previously encountered non-transactional operation, threads are halted at a safe point, such as a point that does not overlap with the operation being patched, and the patch is performed.
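The jump-based patching described above can be modeled on a toy instruction list. This is only a simulation under stated assumptions: instructions are represented as tuples, the helper names patch_access and linearize are hypothetical, and the halting of threads at a safe point is not modeled.

```python
def patch_access(code, index, stubs, barrier_kind):
    """Overwrite code[index] with a jump to an out-of-line stub.

    The stub holds a call to the barrier, a copy of the original access,
    and a jump back to the instruction following the patched one.
    """
    original = code[index]
    label = f"stub_{len(stubs)}"
    stubs[label] = [("call_barrier", barrier_kind), original, ("jump", index + 1)]
    code[index] = ("jump", label)
    return label


def linearize(code, stubs):
    """Follow jumps and list operations in the order they would execute."""
    trace, pc = [], 0
    while pc < len(code):
        op = code[pc]
        if op[0] == "jump":
            for stub_op in stubs[op[1]]:
                if stub_op[0] == "jump":
                    pc = stub_op[1]      # return to the main code stream
                    break
                trace.append(stub_op)
        else:
            trace.append(op)
            pc += 1
    return trace


code = [("load", "a1.y"), ("store", "a1.z"), ("load", "a1.w")]
stubs = {}
patch_access(code, 1, stubs, "write")
trace = linearize(code, stubs)
```

In this model the patched store still executes exactly once, but only after the write barrier has been called, and the surrounding instructions are left untouched.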

Although specific examples of patching and recompiling of code are discussed, inserting a transactional barrier at a previously encountered non-transactional access operation may be done by any known method.

In contrast to the discussion above, if no conflict is detected in decision block 305, then as illustrated in block 310, no transactional barrier is inserted or executed at the non-transactional memory access operation.

Turning to FIG. 4, an embodiment of a flowchart for a method of optimizing barriers for a strong atomicity transactional memory system is illustrated. In block 405 a non-transactional memory access operation, such as a load or store operation, is encountered. In one embodiment, by default, space may be reserved, as described above, for later insertion of a transactional barrier.

In one embodiment, upon encountering the non-transactional memory access operation, an entry in a global table is updated to hold information about a data element referenced by the non-transactional memory access operation, in association with a transactional access state. In one embodiment, by default, when adding a new entry to the table, the transaction access state is updated to a not-accessed state, i.e. the data element is not accessed within a transaction. However, if the table already contains the entry for the data, then the transactional access state of that entry is not modified.

Furthermore, if a transactional access to the data element is encountered before the non-transactional memory access, then an entry may have already been updated/created by the transactional access. Here, the data element may be associated with a transaction access state of read-only, i.e. the data element is read inside a transaction, or read-write, i.e. the data element is at least written to in a transaction. In this instance, an access identification (ID) field of the entry, which is to hold a list of references to non-transactional memory accesses to be updated in response to detecting a conflict, may be updated to also reference the non-transactional memory access. However, the transactional access state is not modified.

In one embodiment, not every non-transactional memory access operation is considered for conflict detection. In other words, in the example where a table is used for conflict analysis, an entry is not created for some non-transactional memory access operations. For example, non-transactional memory accesses to local thread data or other local temporary data elements that are not at risk of conflict with a transactional access are, in this embodiment, not considered for conflict detection. However, in other embodiments, this local data may be considered for conflict detection.

In block 410 a transactional access operation to access the data element referenced by the non-transactional memory access is encountered. Note, if no transactional access to the data element is encountered, then no barrier is generated, as in block 430. In block 410, it is determined if the transactional access includes a transactional load or a store. If the transactional access is a store operation, then the transaction access state is updated to a read-write state, i.e. the data element is at least written to in a transaction and may be read within the transaction as well. Consequently, a compiler inserts the appropriate read or write barriers at previously encountered non-transactional memory accesses that potentially conflict with the transactional store in block 415. These previously encountered non-transactional memory accesses, as discussed above, may be held in an entry of a table associated with the data element referenced by the transactional memory access.

Alternatively, if the transactional memory access includes a load operation, then, in one embodiment, barriers are inserted at non-transactional store operations, but not at non-transactional load operations. Therefore, in block 420 it is determined if the non-transactional memory access is a load or store. Here, a compiler inserts write barriers at non-transactional store operations that reference the data element, but not at previously encountered non-transactional load operations that reference the data element. Once again, a list of the previously encountered non-transactional load/store operations may be held in an entry of a table associated with the data element.
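The barrier selection of FIG. 4 reduces to a small decision over the kind of transactional access and the kind of each previously encountered non-transactional access. The following is a minimal sketch of that decision; the function name barriers_for and the statement labels are hypothetical.

```python
def barriers_for(tx_is_store, non_tx_accesses):
    """Map each referenced non-transactional access to the barrier it needs.

    non_tx_accesses is a list of (statement, kind) pairs with kind being
    "load" or "store". A value of None means no barrier is generated.
    """
    result = {}
    for stmt, kind in non_tx_accesses:
        if tx_is_store:
            # Transactional store: both loads and stores may conflict.
            result[stmt] = "read barrier" if kind == "load" else "write barrier"
        else:
            # Transactional load: only non-transactional stores may conflict.
            result[stmt] = "write barrier" if kind == "store" else None
    return result


previous = [("L1", "load"), ("W1", "store")]
after_tx_store = barriers_for(True, previous)
after_tx_load = barriers_for(False, previous)
```

Here a transactional store yields a read barrier at L1 and a write barrier at W1, while a transactional load yields a write barrier at W1 only.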

FIG. 5a illustrates an embodiment of a flowchart for a method of optimizing barriers in a strong atomicity transactional memory system utilizing a global memory access table. In one embodiment, a global memory access table is maintained during runtime compilation of code to determine if non-transactional memory access operations should include transactional barriers for strong atomicity.

In block 505 a non-transactional memory access to a data element is encountered. As an example, during runtime compilation of dynamic language code, the non-transactional memory access is encountered, such as during a linear or other scan of the code. As stated above, common examples of data elements for dynamic language code include a class, a type, an object and a field of an object. However, any granularity of data or memory location may be utilized as the herein referred to “data element” in determining potential conflicting accesses to the same data element.

In decision block 510, it is determined if an entry in a global access table for the data element already exists. For example, the global access table may be searchable by a reference to the data element, i.e. the global table is indexed by the referenced data element. Note that any known methods of indexing the table and searching the table may be utilized. If no entry already exists for the data element, then in block 515 an entry is updated to hold a reference to the data element. Here, updating an entry may include creating an entry in the table. Also, any known reference to indicate a data element, such as a class, type, object, field, cache line, or other known data element may be utilized.

Furthermore, in one embodiment, a transaction access state associated with the data element in the entry is, by default, updated to a “not accessed” state or value in block 520. Note that to reach block 520 no transactional or non-transactional access to the data element has been encountered, as there is no entry in the table for the data element. As a result, the state of the data element is set to “not accessed” in response to encountering the first access, which is a non-transactional access, to the data element.

Also, in one embodiment, the entry is updated to include a reference to the non-transactional memory access. Here, the entry potentially holds a list of previously encountered non-transactional memory accesses where no barrier was inserted or a lightweight barrier was inserted. As a result, when encountering the first access to a data element, which is a non-transactional access, the non-transactional access, in this case, is added to the entry in case a conflicting transactional memory access is later encountered.

In one embodiment, a reference to the non-transactional memory access includes a reference that identifies the memory access individually. In another embodiment, the reference includes a reference to an inclusive structure, such as a method that includes the non-transactional memory access.

As alluded to above, in one embodiment, in response to determining a full read or write barrier is not to be inserted at a non-transactional memory access, no barrier is inserted at the access. Alternatively, in another embodiment, in response to determining a full read or write barrier is not to be inserted at a non-transactional memory access, a lightweight barrier is inserted at the non-transactional memory access. Here, the lightweight barrier may later be disregarded, executed with less overhead than a full barrier, or patched/transitioned to a full barrier in response to encountering a conflicting transactional access, as discussed in more detail below.

Returning to decision block 510, when the table is searched, instead of not locating an entry and continuing on to block 515, in one embodiment, an entry is found that is associated with the data element. In other words, the global table is searched and an entry associated with the data element exists. If an entry is located, then it is determined if the entry indicates a conflicting transactional access. In one embodiment, which is not illustrated, if the entry indicates a transactional access to the data element has been detected, then a conflict is determined and appropriate barriers are inserted.

Alternatively, in another embodiment, insertion of a barrier depends on a transactional access state associated with the data element in the located entry of the table. As a result, in decision block 525, a transactional access state associated with the data element in the located entry is checked. Here, if the transactional access state represents a “read/write” state, i.e. the data element is written to and potentially read within a transaction, then a conflict is detected. Consequently, the appropriate barrier is inserted in block 530. For example, if the non-transactional memory access includes a load operation, then a read barrier is generated at the non-transactional memory access. Alternatively, if the non-transactional memory access includes a store operation, then a write barrier is generated at the non-transactional memory access.

However, if the transactional access state represents a “read” state, i.e. the data element is read and not written to in a transaction, then, in one embodiment, insertion of a transactional barrier depends on the type of non-transactional memory access. Therefore, in block 540 it is determined if the non-transactional memory access includes a store operation. If the non-transactional access includes a store, then a write barrier is inserted/generated at block 530.

In contrast, if the non-transactional memory access includes a load operation, then a reference to the non-transactional memory access is appended to any references held in the entry. As stated above, instead of generating no barrier, a lightweight barrier may be inserted here. In other words, the current transactional access state is read-only, which does not conflict with a non-transactional load. However, if a transactional store is later encountered and the access state is updated to read/write, then the reference to the non-transactional load operation is utilized to insert a barrier, as described below.

Returning to block 535, if the transactional access state does not include a read/write state or a read state, then it is determined that the transactional access state includes a not-accessed state. Here, no barrier or a lightweight barrier is inserted and the reference is added to the entry in block 545, as discussed above. Essentially, an entry has been encountered in the not-accessed state, which typically means that a previous non-transactional access has been encountered, but no transactional access has been encountered. Consequently, the reference is appended in the entry in block 545 to ensure an appropriate barrier is inserted at the non-transactional access in response to subsequently encountering a conflicting transactional access.
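The flow of FIG. 5a, as described in the text, can be sketched as a single function that is called for each non-transactional access. This is an illustrative sketch, not the claimed implementation; the names Entry and compile_non_tx_access are hypothetical, and the block numbers in the comments follow the description above.

```python
from dataclasses import dataclass, field

NOT_ACCESSED, READ, READ_WRITE = "not-accessed", "read", "read-write"


@dataclass
class Entry:
    state: str = NOT_ACCESSED
    access_ids: list = field(default_factory=list)  # (statement, is_store)


def compile_non_tx_access(table, data, stmt, is_store):
    """Return the barrier to emit at a non-transactional access, or None."""
    entry = table.get(data)
    if entry is None:
        # Blocks 515-520: create the entry in the default not-accessed state.
        entry = table[data] = Entry()
    if entry.state == READ_WRITE:
        # Blocks 525-530: data is written in a transaction, so any access conflicts.
        return "write barrier" if is_store else "read barrier"
    if entry.state == READ and is_store:
        # Blocks 535-540: data is only read in a transaction; stores conflict.
        return "write barrier"
    # Block 545: no conflict yet; remember the access for possible later patching.
    entry.access_ids.append((stmt, is_store))
    return None
```

A non-transactional load against a read-state entry therefore produces no barrier, but its reference is retained so that a barrier can be inserted if a transactional store is encountered later.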

Referring to FIG. 5b, another embodiment of a flowchart for a method of optimizing barriers in a strong atomicity transactional memory system utilizing a global memory access table is illustrated. A transactional memory access operation referencing a data element is encountered or detected in block 550 in a similar manner to encountering the non-transactional access in block 505 of FIG. 5a. In one embodiment, the global access table is searched/checked, as described above, in decision block 555 to determine if an entry associated with the referenced data element exists.

In one embodiment, if no entry exists, then an entry is created/updated to hold the referenced data element and the appropriate transaction access state as described in blocks 560-570. In one embodiment, which is not illustrated, the access state includes an accessed or not-accessed state. Here, a transactional access and a non-transactional access are determined to be potentially conflicting, and no delineation between stores and loads is made. However, in an alternate embodiment, the transaction access state is updated to a read/write state if the transactional access includes a transactional store operation, and the transactional access state is updated to a read state if the transactional access includes a transactional load operation. In either case, an identifier field to hold a method or other reference to a non-transactional memory access may be left blank or unmodified.

Returning to decision block 555, if an entry does exist for the data element, then in decision block 570 it is determined if the transactional memory access operation includes a load or store operation. If the transactional access includes a store operation, then in block 595 the transaction access state of the entry is updated to a read-write state. In one embodiment, a read-write state is to indicate that an associated data element or memory location is to be at least written to in a transaction and may also be read in the transaction. In response to the transaction state being updated to the read/write state, read and write barriers are inserted at previously encountered non-transactional memory access operations in block 590. Note from above that these previously encountered non-transactional operations are referenced in the entry.

In contrast, if the transactional access includes a load operation to read the data element, then at block 575 the transaction access state is updated to a read-only state. Here, the read-only state indicates that a data element or memory location is to be read in a transaction, but not written to in a transaction. In decision block 580, after determining the transactional access includes a load operation, it is determined if a referenced non-transactional memory access includes a load or a store operation. If the non-transactional access includes a store operation, then a write barrier is inserted at the non-transactional store operation in block 590. However, in one embodiment, if the non-transactional access includes a load operation, then no barrier or a lightweight barrier is inserted at block 585.

Whether inserting a write barrier at non-transactional store operations listed in the entry in response to updating the transactional access state to a read state, or inserting read and write barriers at non-transactional load and store operations listed in the entry in response to updating the transactional access state to a read/write state, inserting barriers at block 590 may be done by any known method. As a first example, when no barrier is inserted at a previously encountered non-transactional access listed in the entry, the code is patched to insert an appropriate barrier. Patching, as described above, in an illustrative example includes overwriting the access operation with a jump operation to direct flow to the barrier before performing the access operation. As a second example, instead of patching, the code is recompiled utilizing on-stack replacement to insert the barriers at the access operation.

In another embodiment, where a lightweight barrier or other space-reserving operations are inserted at a previously encountered non-transactional access listed in the entry, patching is utilized to overwrite the lightweight barrier with a full barrier, i.e. to transition the lightweight barrier to a full barrier.

Therefore, whether a transactional or non-transactional memory access is encountered before the other, conflicting accesses may be detected and barriers may be selectively inserted at non-transactional memory operations that may potentially conflict with transactional memory operations, either at compilation of the non-transactional operation or later when the conflicting transactional access is detected.
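The transactional side of the analysis (FIG. 5b) can be sketched as a function that updates the table entry and returns the barriers to insert at previously encountered non-transactional accesses. This is a sketch under stated assumptions: the names Entry and compile_tx_access are hypothetical, and keeping a read-write state from being downgraded by a later transactional load is an assumption of the sketch rather than something the text states.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    state: str = "not-accessed"
    access_ids: list = field(default_factory=list)  # (statement, "load" | "store")


def compile_tx_access(table, data, is_store):
    """Update the entry for a transactional access; return barriers to insert."""
    entry = table.setdefault(data, Entry())   # create the entry if none exists
    if is_store:
        entry.state = "read-write"
        # Every referenced non-transactional access now needs a full barrier.
        pending = [
            (stmt, "write barrier" if kind == "store" else "read barrier")
            for stmt, kind in entry.access_ids
        ]
        entry.access_ids = []
    else:
        if entry.state != "read-write":       # assumption: never downgrade the state
            entry.state = "read"
        # Only referenced stores conflict with a transactional load.
        pending = [(s, "write barrier") for s, k in entry.access_ids if k == "store"]
        # Loads stay listed in case a transactional store is encountered later.
        entry.access_ids = [(s, k) for s, k in entry.access_ids if k == "load"]
    return pending
```

The returned list is what a patching or on-stack-replacement step would consume; how the barriers are physically inserted is left open, as in the description.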

Turning to FIG. 6, an illustrative embodiment of an access table to support optimization of transactional barriers for exemplary code is illustrated.

FIGURE A: Exemplary pseudo-code for dynamic compilation

    class A {
        int w;
        int x;
        string y;
        object z;
    };

    void foo (A a1) {
        S1: int t0 = a1.w;
        S2: int t1 = a1.x;
        S3: String t2 = a1.y;
        S4: a1.z = t2;
    }

    void bar (A a2) {
        S5: a2.x = 1;
        atomic {
            S6: int t4 = a2.w;
            S7: a2.y = “Hello”;
            S8: Object t5 = a2.z;
        }
    }

As illustrated, table 605 includes an illustrative representation of a global memory access table after function foo from Figure A is compiled, while table 610 includes an illustrative representation of table 605 after function bar from Figure A has been compiled. Note that table 605 includes three fields: a data field, a transaction access state field, and an access ID field. As stated above, the data field may hold a reference to a data element, memory location, or other common reference between memory access operations.

In one embodiment, the transaction access state is either not-accessed to indicate no access within a transaction or accessed to indicate the referenced data is accessed in a transaction. In another embodiment, access states include not accessed (not accessed in a transaction), read only (only read in a transaction), and read-write (at least written to in a transaction). Furthermore, as stated above, the access ID field may hold a reference to individual non-transactional memory accesses or to methods including non-transactional accesses that are associated with the entries.

As an oversimplified illustrative example, the pseudo code of Figure A is herein described in reference to FIG. 6. During dynamic compilation in this example, function foo is compiled before function bar. Upon encountering statement 1 (S1), entry 620 is created/updated to hold a reference to the data element, i.e. A.w, the transaction access state is updated to the default not-accessed state, and an access ID represented here as S1 is updated in the access ID field. The reference S1 is an illustrative representation of a reference to a single non-transactional operation. In another embodiment, a reference to the method foo, i.e. the method including statement 1, is updated in the access ID field.

Similarly, through compilation of method foo, entries 621-623 are updated in response to encountering statements S2, S3, and S4. Note that, as no transactional accesses have been encountered, the transaction access state for each of the entries 620-623 is updated to not-accessed. Here, references to the individual non-transactional memory accesses are held in each access ID field, which are associated with references to the data to be accessed held in the data fields.

Next, the compiler begins to compile method bar and encounters statement 5 (S5), a non-transactional write to field x of class A. Here, there is no transactional access to field x, so the transaction access state of entry 621 is not modified. However, the access ID field is updated to also include a reference to S5. As a result, upon encountering a subsequent conflicting transactional access, an accurate record of accesses (S2 and S5) and/or methods to patch is held in the access ID field of entry 621.

Statement S6 of method bar is then encountered, which includes a transactional read of the field w of class/type A. In other words, a transactional load operation from field w is encountered. The global table is searched and entry 620 is found, which corresponds to field w of class A. As the transactional access includes a load, the transaction access state of entry 620 is updated to a read-only state. Similarly, in response to statements S7 and S8, transaction access states of entries 622 and 623 are updated to read-write and read-only, respectively, as statement S7 includes a write to field A.y and statement S8 includes a read from field A.z.

In one embodiment, a compiler enumerates through table 610 to determine which entries include accesses and/or methods to be patched. Entry 621 includes a not-accessed access state, so no barriers are inserted at operations S2 and S5. Entry 620 includes a read-only transaction access state; therefore, in one embodiment, write barriers are to be inserted at any referenced non-transactional store operations. Here, only a load, i.e. S1, is referenced in the access ID field, so no barrier is to be inserted. However, note that if only the method foo were referenced in the access ID field, then write barriers would be inserted at the store operations, i.e. S4.

Similarly, entry 623 includes the read-only access state, so write barriers are inserted at the non-transactional store operation S4. Furthermore, entry 622 includes a read-write access state. As a result, read and write barriers are to be appropriately inserted. Here, S3 includes a load operation, so a read barrier is inserted accordingly. Note that insertion of barriers may be done by any known method, such as the methods described above.
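The walk-through of Figure A can be reproduced with a short script that first builds the table contents corresponding to table 610 and then enumerates the entries to decide on barriers. This is a simplified sketch: the dictionary layout and the names STATEMENTS, build_table and enumerate_barriers are hypothetical, and individual statements (rather than methods) are assumed to be recorded in the access ID field.

```python
# (statement, data element, kind, transactional), in compilation order: foo, then bar.
STATEMENTS = [
    ("S1", "A.w", "load", False),
    ("S2", "A.x", "load", False),
    ("S3", "A.y", "load", False),
    ("S4", "A.z", "store", False),
    ("S5", "A.x", "store", False),
    ("S6", "A.w", "load", True),
    ("S7", "A.y", "store", True),
    ("S8", "A.z", "load", True),
]


def build_table(statements):
    """Build the access table: data -> {state, access_ids}."""
    table = {}
    for stmt, data, kind, transactional in statements:
        entry = table.setdefault(data, {"state": "not-accessed", "access_ids": []})
        if not transactional:
            entry["access_ids"].append((stmt, kind))
        elif kind == "store":
            entry["state"] = "read-write"
        elif entry["state"] == "not-accessed":
            entry["state"] = "read-only"
    return table


def enumerate_barriers(table):
    """Walk every entry and pick a barrier (or None) for each referenced access."""
    barriers = {}
    for entry in table.values():
        for stmt, kind in entry["access_ids"]:
            if entry["state"] == "read-write":
                barriers[stmt] = "read barrier" if kind == "load" else "write barrier"
            elif entry["state"] == "read-only" and kind == "store":
                barriers[stmt] = "write barrier"
            else:
                barriers[stmt] = None
    return barriers


table_610 = build_table(STATEMENTS)
barriers = enumerate_barriers(table_610)
```

The result agrees with the discussion of entries 620-623: no barrier at S1, S2, or S5, a read barrier at S3, and a write barrier at S4.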

Therefore, as can be seen from above, barrier overhead for strong atomicity may be optimized. Previously, weakly atomic systems provided efficient execution, but potentially incurred risks of inaccurate execution due to conflicts between non-transactional and transactional memory accesses. Furthermore, fully strong atomic systems without optimization incur significant overhead at non-transactional accesses. As a result, during dynamic compilation barriers may be inserted at non-transactional operations that potentially conflict with transactional accesses, while barriers may also be omitted from non-transactional accesses that do not potentially conflict with transactional accesses. This optimized removal/omission of barriers provides the advantages of a strong atomic system, i.e. isolation between non-transactional and transactional memory accesses that require isolation, while also providing the advantage of a weakly atomic system, i.e. efficient execution overhead for non-transactional accesses that do not require isolation.

A module as used herein refers to any hardware, software, firmware, or a combination thereof. Often module boundaries that are illustrated as separate commonly vary and potentially overlap. For example, a first and a second module may share hardware, software, firmware, or a combination thereof, while potentially retaining some independent hardware, software, or firmware. In one embodiment, use of the term logic includes hardware, such as transistors, registers, or other hardware, such as programmable logic devices. However, in another embodiment, logic also includes software or code integrated with hardware, such as firmware or micro-code.

A value, as used herein, includes any known representation of a number, a state, a logical state, or a binary logical state. Often, the use of logic levels, logic values, or logical values is also referred to as 1's and 0's, which simply represents binary logic states. For example, a 1 refers to a high logic level and 0 refers to a low logic level. In one embodiment, a storage cell, such as a transistor or flash cell, may be capable of holding a single logical value or multiple logical values. However, other representations of values in computer systems have been used. For example, the decimal number ten may also be represented as a binary value of 1010 and a hexadecimal letter A. Therefore, a value includes any representation of information capable of being held in a computer system.

Moreover, states may be represented by values or portions of values. As an example, a first value, such as a logical one, may represent a default or initial state, while a second value, such as a logical zero, may represent a non-default state. In addition, the terms reset and set, in one embodiment, refer to a default and an updated value or state, respectively. For example, a default value potentially includes a high logical value, i.e. reset, while an updated value potentially includes a low logical value, i.e. set. Note that any combination of values may be utilized to represent any number of states.

The embodiments of methods, hardware, software, firmware or code set forth above may be implemented via instructions or code stored on a machine-accessible or machine-readable medium which are executable by a processing element. A machine-accessible/readable medium includes any mechanism that provides (i.e., stores and/or transmits) information in a form readable by a machine, such as a computer or electronic system. For example, a machine-accessible medium includes random-access memory (RAM), such as static RAM (SRAM) or dynamic RAM (DRAM); ROM; magnetic or optical storage medium; flash memory devices; electrical storage devices, optical storage devices, acoustical storage devices, or other forms of propagated signal (e.g., carrier waves, infrared signals, digital signals) storage devices; etc. For example, a machine may access a storage device through receiving a propagated signal, such as a carrier wave, from a medium capable of holding the information to be transmitted on the propagated signal.

Reference throughout this specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, the appearances of the phrases “in one embodiment” or “in an embodiment” in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.

In the foregoing specification, a detailed description has been given with reference to specific exemplary embodiments. It will, however, be evident that various modifications and changes may be made thereto without departing from the broader spirit and scope of the invention as set forth in the appended claims. The specification and drawings are, accordingly, to be regarded in an illustrative sense rather than a restrictive sense. Furthermore, the foregoing use of embodiment and other exemplary language does not necessarily refer to the same embodiment or the same example, but may refer to different and distinct embodiments, as well as potentially the same embodiment.

1. A machine readable medium including compiler code which, when executed by a machine, causes the machine to perform the operations of: determining a plurality of memory accesses in program code that cannot conflict with transactions utilizing not accessed in a transaction (NAIT) analysis; and compiling the plurality of memory accesses into a plurality of compiled memory accesses in response to determining the plurality of memory accesses in the program code that cannot conflict with transactions utilizing not accessed in a transaction (NAIT) analysis, wherein compiled memory accesses, when executed, are to access memory without performing barriers to detect data conflicts.
2. The machine readable medium of claim 1, wherein determining a plurality of memory accesses in program code that cannot conflict with transactions utilizing not accessed in a transaction (NAIT) analysis comprises: maintaining a table including an entry for each of the plurality of memory accesses comprising a reference to a data object associated with each of the plurality of memory accesses; determining the plurality of memory accesses cannot conflict with transactions in response to each entry for each of the plurality of memory accesses comprising a not accessed in a transaction state.
3. The machine readable medium of claim 1, wherein determining a plurality of memory accesses in program code that cannot conflict with transactions utilizing not accessed in a transaction (NAIT) analysis comprises: determining no transactional memory accesses reference a data object associated with each of the plurality of memory accesses.
4. The machine readable medium of claim 1, wherein compiled memory accesses, when executed, are to access memory without performing barriers to detect data conflicts comprises: the compiled memory accesses, when executed, are to access memory without performing operations to ensure isolation.
5. The machine readable medium of claim 1, wherein compiling the plurality of memory accesses into a plurality of compiled memory accesses in response to determining the plurality of memory accesses in the program code that cannot conflict with transactions utilizing not accessed in a transaction (NAIT) analysis, wherein compiled memory accesses, when executed, are to access memory without performing barriers to detect data conflicts comprises: compiling the plurality of memory accesses into a plurality of direct memory accesses, which when executed, are to access memory without performing operations to ensure isolation.
6. A machine readable medium including compiler code which, when executed by a machine, causes the machine to perform the operations of: determining a location that is not accessed in a transaction utilizing Not Accessed in a Transaction (NAIT) analysis; determining an object that is read-only in a transaction; compiling a first memory access to the location into a first compiled memory access, wherein the first compiled memory access, when executed, is to access the location without performing barriers to detect data conflicts; and compiling a second memory access to the object into a second compiled memory access, wherein the second compiled memory access, when executed, is to access the object without performing barriers to detect data conflicts.
7. The article of manufacture of claim 6, wherein determining a location that is not accessed in a transaction utilizing Not Accessed in a Transaction (NAIT) analysis comprises: determining the location is not accessed in a transaction in response to a table including a reference to the location holding a not accessed in a transaction state responsive to no potential conflicting transactional memory access operation referencing the location.
8. The article of manufacture of claim 7, wherein determining an object that is read-only in a transaction comprises: determining the object is read only in a transaction in response to the table including a reference to the object holding a read only transactional state responsive to a transactional memory read access operation referencing the object.
9. The article of manufacture of claim 6, wherein compiling a first memory access to the location into a first compiled memory access, wherein the first compiled memory access, when executed, is to access the location without performing barriers to detect data conflicts comprises: compiling the first memory access to the location into a direct memory access to the location, wherein direct memory access, when executed, is to access the location without performing operations to ensure isolation.
10. The article of manufacture of claim 6, wherein compiling a second memory access to the object into a second compiled memory access, wherein the second compiled memory access, when executed, is to access the object without performing barriers to detect data conflicts comprises: compiling the second memory access to the object into a direct memory access to the object, wherein direct memory access, when executed, is to access the object without performing operations to ensure isolation.